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A discrete-time random process is described which can generate bursty sequences of events. A 
Bernoulli process, where the probability of an event occurring at time t is given by a fixed probability 
X, is modified to include a memory effect where the event probability is increased proportionally to 
the number of events which occurred within a given amount of time preceding t. For small values of 
X the inter-event time distribution follows a power-law with exponent —2 — x. We consider a dynamic 
network where each node forms, and breaks connections according to this process. The value of x for 
each node depends on the fitness distribution, p{x), from which it is drawn; we find exact solutions 
for the expectation of the degree distribution for a variety of possible fitness distributions, and for 
both cases where the memory effect either is, or is not present. This work can potentially lead to 
methods to uncover hidden fitness distributions from fast changing, temporal network data such as 
online social communications and fMRI scans. 


PACS numbers: 64.60.aq 89.65.-s 


The mathematics of interactive complex systems has a 
vital role to play in the interpretation of large-scale so¬ 
cial and biological data. Technology which facilitates the 
collection of vast amounts of information is increasingly 
becoming available for both academic and commercial 
purposes; however, in the absence of a detailed under¬ 
standing of the underlying processes, there will always 
be a risk of deriving the wrong conclusion from the facts. 
Complexity science provides numerous models of social, 
biological, physical and economic systems which combine 
large numbers of individual components to reproduce the 
types of behaviour observed on the systemic level. The 
components in such systems are usually uninteresting in 
isolation, but when allowed to interact with each other 
they produce complex non-trivial patterns which in some 
cases agree very well with empirical results. This poses 
a challenge for data scientists: given information only 
about the system as a whole, with all its complex and 
interactive dynamics, how can one conclude anything 
about the individual components? To begin answering 
that question we need to understand, in mathematical 
terms, the form and extent of the biases that complexity 
creates. 

The purpose of the present work is to provide an under¬ 
standing of how one very simple mechanism, a memory 
effect (brought about by interaction), will bias the statis¬ 
tical properties of a complex system such as the distribu¬ 
tion of communication activity in a social network, or the 
distribution of brain activity of different cortical regions 
in a fMRI scan. We consider a hypothetical system of 
individual agents (nodes) and the instantaneous pairwise 
interactions which happen between them (edges). By ag¬ 
gregating all of the interactions that occur within some 
given time window, a network is formed whose structure 
can be analysed for a deeper understanding of the system. 
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In general, the length of this time window determines the 
density of the network; as an increasing amount of data is 
aggregated, a picture of the system emerges which shows 
not only whether or not two nodes are connected, but 
also includes the strength of their relationship through 
the frequency of their interactions. 

Throughout this paper we will be comparing two pos¬ 
sible forms of stochastic process: Markovian and non- 
Markovian. In the non-Markovian case the rate of activ¬ 
ity of the individual agents in the system is proportional 
to the number of events which the agent can ‘remember’; 
these are events which happened at earlier times and are 
stored in a memory of a given fixed size. We will refer 
to this as the “memory effect”. When a large number 
of interacting agents are considered, the memory of an 
individual is recorded in the structure of the network of 
interactions. Specifically, the number of interactions a 
node can ‘remember’ is effectively the same as its degree, 
this way the mechanism for creating links in the non- 
Markovian network model is a form of linear preferential 
attachment [l|. Likewise, the process of ‘forgetting’ is an 
edge deletion mechanism Q. 


I. RELATED WORK 

The motivation for this work is the growing evidence of 
memory dependent, burst-like activity in complex inter¬ 
active systems. Recently this has been most prominent in 
the study of online communication patterns [3| . Evidence 
for burstiness is found in the distribution of inter-event 
times between actions; in a Poisson process, for example, 
which is Markovian i.e. memory less, the inter-event times 
follows an exponential distribution; periodic events, such 
as a heartbeat, have inter-event times which generally 
stay close to the mean; and lastly, in systems which are 
generally said to be “bursty”, the inter-event times follow 
a power-law distribution. A formal definition has been 
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proposed to quantify these behaviours in Q. A slightly 
different approach in Q identifies a “burst” as a sequence 
of events where each event follows the previous one within 
a given time interval. This definition naturally leads to 
the consideration of two possible types of event: those 
which happen spontaneously, and those which occur as 
reactions to previous events (e.g. the dynamics of human 
conversations). 

The current explanations for why a sequence of events 
might have a power-law inter-event time distribution rely 
on a memory effect; in other words the probability of an 
event occurring at a given time is dependent on events 
which occurred at previous times. Models have been pro¬ 
posed based on queuing theory where incoming messages 
are replied to according to some prioritisation strategy 
P-Q. By adjusting a parameter which controls the ran¬ 
domness of the strategy, these models have been shown 
to create power-law distributed inter-event times with ex¬ 
ponents that agree with a number of real-world data-sets. 

Many of of the systems in which burstiness has been 
observed cannot be considered to have the internal 
mechanisms of a queuing model, these include studies 
of the human brain Q, animal movement patterns 0 
and consumer behaviour [111. Bursts of activity closely 
resemble cascading events such as avalanches and mass 
extinctions and therefore might possibly be examples of 
self-organised criticality (SOC) [l^, where the focus is 
on the emergence of scale-free distributions based on 
very few assumptions about the system. In fact, the Bak- 
Sneppen model M , one of the fundamental examples of 
SOC, is known to have a power-law distribution of inter¬ 
event times [l^ . Much of the analytical progress made in 
the study of bursts has come from related models [iM3. 


The present work examines how the bursty behaviour 
of individual interacting agents affects the large scale 
macroscopic view of the system. We consider activity 
on a dynamic network which is closely related to sev- 
eral previously studied models: Preferential attachment 
[ij, [ 2 ^ is a non-Markovian method by which many net¬ 
works grow. In this process, nodes are added to the net¬ 
work sequentially in discrete time-steps and edges are 
created between the new node and old nodes selected 
randomly but with probability proportional to their de¬ 
gree. The rate of growth in connectivity of a node at any 
given time therefore depends on its entire history. Con¬ 
versely in “fitness” networks the connectivity of a node 
accords only to an attractiveness value drawn from some 
probability distribution pH ] . Such models are versatile in 
their applications as they can incorporate various topo¬ 
logical network features such as clustering, and have also 
allowed complex network topologies to be incorporated 
into SOC models [l^] (we note here that a significant 
proportion of the fitness network literature concerns cor¬ 
relations between the fitness of connected nodes, while 
the present work concerns only uncorrelated networks). 
A simple way to combine fitness and preferential attach¬ 
ment has been achieved by defining the attractiveness of 


a node to be either the sum or product (or a combination 
of both) of its degree and its intrinsic fitness psI - Esj . 

The networks mentioned so far are static in the sense 
that once a link is created between nodes it remains in 
that location forever. In many situations this is not the 
case and we here use the term “dynamic” to refer to 
networks whose edges can be removed as new ones are 
created [1, [2614^ . The model introduced in com¬ 
bines the preferential selection of nodes with an added 
fitness parameter (the same for every node) on a dy¬ 
namic network where edges are removed so that the total 
number of edges remains constant. The authors focus 
on the problem of finding the degree distribution; what 
they do not mention is that the degree of each individual 
node in this model fluctuates with a memory-driven 
bursty process with power-law distributed inter-event 
times between each attachment event. The present work 
provides a mathematical description of this behaviour. 
Additionally, by incorporating heterogeneous fitness 
distributions into the previous model, we will describe a 
class of complex networks which exhibit a rich variety of 
structural and time-dependent properties. 

The model presented in this paper is a versatile and 
applicable dynamic network with varying node fitness. 
There is currently research activity in related areas that 
is of much interest: time varying networks, in particu¬ 
lar, have some similarities with dynamic networks. Infor¬ 
mally speaking, these are multi-layered networks where 
each layer corresponds to a distinct time interval, they 
differ from dynamic networks because at the end of each 
time interval the entire network (rather than just a sin¬ 
gle edge) is removed and replaced [13, HIl. In its most 
basic form the “action potential” (the propensity to act 
at any given time) of each node does not change with 
time. In memory effects are considered within the 
time varying formulation. The authors observe in social 
communication data a universal rule for the probability 
that an individual will continue an old correspondence 
rather than start a new one. Adding this constraint to 
the original time-varying concept gives accurate results 
regarding the number of contacts and the weight of cor¬ 
respondence with each contact. 

In [sljl the waiting time distribution between actions 
takes an arbitrary form, thus the action potential of each 
node may vary. When the waiting time takes a power- 
law form they find that the exponent of the degree dis¬ 
tribution depends on the exponent of the waiting time 
distribution. Similarly, in the model introduced in [s^, 
the rates at which new links are formed and broken, and 
the rate at which old links become active, depend on the 
probability distribution of inter-event times. The authors 
choose to examine the power-law inter-event time distri¬ 
bution, commenting that this is akin to a preferential 
attachment mechanism. Unlike the present work how¬ 
ever, the power-law is an assumption and not an emer¬ 
gent property. 

The aforementioned studies do not contradict the 
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work presented here, these papers are complimentary. 
Together they reinforce the movement to unify bursty 
dynamics and network structure. 

The remainder of this paper is structured as follows: 
We describe a process for generating a sequence of events 
which, under certain circumstances, produces a power- 
law inter-event time distribution. In the section which 
follows, we introduce a model of an evolving network 
where edges are removed and replaced at each time- 
step. Within this section two possible attachment kernels 
are described, the first is entirely fitness based, the sec¬ 
ond has an additional preferential attachment mechanism 
which can be interpreted as an increased propensity to 
act caused by previous interactions. We show that in the 
latter case, the activity of the nodes is described by the 
random process of Section IE Results are presented and 
we present figures which show the degree distributions in 
some special cases of the model. In Section m we high¬ 
light the advances achieved by this research. We then dis¬ 
cuss briefly its possible applications and elements which 
require further study. In Appendix the solution for 
the inter-event time distribution is shown. In Appendix 
[5] we show how the network is described mathematically, 
and derive results regarding the degree distribution for a 
general fitness distribution. In Appendix [C] we look at 
some special cases of the fitness distribution. 


II. GENERATING EVENT SEQUENCES WITH 
POWER-LAW DISTRIBUTED INTER-EVENT 
TIMES 


Before discussing the network topology of a population 
of interacting agents, we first examine a process which de¬ 
scribes the memory dependent behaviour of an individ¬ 
ual node. We describe a discrete-time stochastic process 
which generates an infinite sequence of binary random 
variables Ai, A 2 ,.... At time t we may have At = 1, 
which we consider to be an ‘event’, or Af = 0 , which we 
consider to be a moment of inactivity. The system has 
a memory capacity of size M, meaning that there are M 
locations, mi{t),m 2 {t), where an event may 

be stored, i.e. mn{t) € 0,1 for n G 1,2,...M. We define 
kt = iTT-nit) and let the event probability kernel / 

be any function such that 0 < f{kt) < I for kt € 0,..., M. 
We consider two possible ways, random and age-based, 
in which events can be deleted from the memory. 


A. Randomised memory 


At time t, 

1. With probability f{kt), Xf = 1. With probability 
l-f{kt),Xt = 0. 


2. Integer n' is selected uniformly at random from 
1,2, ...,M and -I- 1) = Aj. For all other 

n ^ n', mn(t + 1 ) = mn(t). 

Since there is always a non-zero probability that At = 1 
(and similarly that Af = 0 ) the process will continue in¬ 
definitely without ever reaching an absorbing state. For 
example, if on the contrary f{kt) = kt/M, and we start 
from an initial state where fco 7 ^ 0 , then we will even¬ 
tually end up in one of two states: either fcf = M or 
kt = 0. Analysis of this particular case is important to 
evolutionary biology (35j |. We find that by eliminating 
the possibility of absorption the statistical properties of 
the sequence can be calculated in the t —7 00 limit. 


B. Age-based memory 

The randomised memory process is approximately 
equivalent to the following alternative description: In 
each iteration we perform step I as before. We then 
set mM{t -I- 1 ) = Af and TO„(t -f 1 ) = for 

1 > n > M — 1. This way i will ‘remember’ all the 
events which happened in the previous M iterations. For 
example. 


... OOOOOlOlOlOOWAf. 

M 

Assuming that the value of mi is not correlated with 
the value of kt, i.e. the probability of removing a I is 
well approximated by kt/M, then the solutions given in 
Appendix 1^ are applicable in both cases. 
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FIG. 1. The inter-event time distribution for a sequence of 
events generated by the process described in Section fll Al The 
simulation lasted for 10® iterations with M = 10® and e = 1. 
The markers show the log-binned frequencies (normalised to 
give the proportion of inter-event times of length r). The 
dotted lines show the corresponding slope predicted by Eq. ©• 
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We are interested in IIt-, the probability that the inter¬ 
event time of a randomly selected pair of consecutive 
events will be exactly r. Our analysis focuses on the 
linear probability kernel 


fih) 


kt + x 
M + X + e 


( 1 ) 


where x and e are real positive numbers. As x increases, 
the contribution from the memory factor k becomes less 
important and the system approaches a Bernoulli pro¬ 
cess. At the other extreme, when x is small relative to 
M, the inter-event time distribution asymptotically fol¬ 
lows a power-law. In Appendix [A] we derive an approx¬ 
imate solution to n,- showing that the exponent of the 
power-law is independent of the parameters M and e, but 
is dependent on the choice of x in the following way 

( 2 ) 


Numerical results are presented in Fig. o for a range of 
values of x. The deviation away from a power-law that 
is present in the very large values of r can be attributed 
to the fact that once the waiting time reaches such high 
values, it becomes overwhelmingly likely that k = 0, 
meaning that memory effects are null. 


Section IIII Bl concerns a network of agents who create 
edges with other nodes dynamically according to a prefer¬ 
ential attachment process, and destroy edges either ran¬ 
domly or according to their age. After introducing the 
network model we will show that its parameters can be 
equated with the parameters of the stochastic process 
described in this section. 


III. DYNAMIC NETWORK MODEL 

We consider a network formed of N nodes and E edges. 
Initially the edges are placed between pairs of randomly 
selected nodes. For each node, a positive continuous ran¬ 
dom variable a; G R is selected according to a probability 
density function p(x) which has mean {x). Following the 
related literature we shall refer to this value as the “fit¬ 
ness” of the node, denoted Xi for the node i. The degree 
ki is the total number edges adjacent to i (note that this 
not the same as the number of neighbours of i since mul¬ 
tiple edges can exist between any pair of nodes). The 
dynamics of the system are described as follows: in each 
iteration, a node i is randomly selected with probability 
given by its attachment kernel n(f), a second node is se¬ 
lected in the same way and an edge is created between 
them. In the same iteration the oldest edge is removed 
(thus E, N and the mean degree (k) = 2E/N remain 
constant throughout). Alternatively we could choose to 
remove a randomly selected edge instead of the oldest, 
these two possibilities correspond to the randomised and 
age-based forms of the processes described in III Al and 
III Bl respectively: the results presented here are applica¬ 
ble to both. Under these rules, the probability that an 


edge will be created between two nodes i and j is pro¬ 
portional to the product of their fitness n(i)n(j). This 
is just one of many ways to combine the fitness of two 
nodes; a wealth of literature exists examining the other 
possibilities and generalisations (see for example [^ . [3^ 1. 
The process is illustrated in Fig.([2|) for both of the at¬ 
tachment kernels considered here. 


(a) (b) 



FIG. 2. Three iterations of the network model starting from 
a random initial configuration. The number of stripes inside 
each node corresponds to the fitness, the node with the least 
stripes has fitness x = 0.5, the others have x = 1, x = 1.5, 
X = 2, X — 2.5 and a: = 3. The number of dashes in each 
edge corresponds to its age with the oldest having the most 
dashes. In each iteration the oldest edge is removed and a new 
edge is added between nodes selected either with probability 
proportional to their fitness (shown in (a)) or with probabil¬ 
ity proportional to the sum of their fitness and their degree 
(shown in (b)). Note that in (a) the fittest node is also the 
most active. In (b) this is not the case. 


In most real-life situations the fitness of a node repre¬ 
sents some hidden (or latent) quantity, whereas its degree 
represents something tangible that appears in empirical 
data-sets. In general, then, an important problem to ad¬ 
dress is in inferring the fitness of the node when given 
only its degree and other properties describing the struc¬ 
ture of the network. In a stochastic system the closest we 
can get to achieving this is finding the probability that a 
node has fitness x given some information about the net- 
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work structure. When the available information is the 
degree of each node, Bayes rule gives the appropriate ex¬ 
pression for this quantity: 


P(x|t) = 

Pk 


( 3 ) 


where pk is the probability of randomly selecting a node 
which has degree k, and P{k\x) is the same probability 
but this time conditioned on x. Thus there is an incentive 
to extract these quantities; as well as being interesting in 
their own right; they are integral to uncovering the hid¬ 
den variables. The analysis in this section focuses mainly 
on deriving the degree distribution and the conditional 
degree distribution for a range of fitness functions. We 
consider the two following possible attachment kernels. 



Inter-event time i: 


A. Dynamic model without memory 


The probability of attaching one end of an edge to a 
node i of fitness Xi is 


n(*) 


Xj 


Xi 

N{x) 


( 4 ) 


Under this condition the Xi can be considered the rate 
of activity of i and one might naively assume that the 
relationship between Xi and the degree of i, ki, is ap¬ 
proximately linear (specifically ki k. Xi x {k)/{x) since 
this would give the correct result for the total degree of 
the network). In general, this is not the case; Figures (ISI) 
and (HH) show that the degree distributions and fitness 
distributions of networks created by this process after a 
large number of time-steps contain fundamental differ¬ 
ences. If p{x) = Xe~^^ then the degree distribution de¬ 
pends only on the mean degree of the network and not at 
all on the parameter A. In this case there are therefore in¬ 
finitely many possible fitness distributions which produce 
the same degree distribution. If p(x) follows a power-law 
with exponent 7 then pk will have a power-law tail with 
the same exponent 7 , small values of k, however, become 
increasingly uncommon as we look at denser networks. 


B. Dynamic model with memory 


The probability of attaching one end of an edge to a 
node i of fitness xt is 


n(*) = 


ki 


Y.Akj+Xj) N{{k) + {x)) 


( 5 ) 


where ki is the degree of i. Memory in this system is 
recorded by the edges, as the current degree influences 
the creation of future edges. Because the edges in this 
system are dynamic, in that the oldest one is removed 
with each iteration while new ones are added, each node 
effectively has a memory which extends backward in time 


FIG. 3. (Colour online) The inter-event time distribution for 
attachment events of a single node i with fitness equal to 
the mean of the fitness distribution [xi = {x)) on Log-Log 
axes (main), and Log-Linear axes (inset). The plotted results 
consider a dense network where N = 10 and E = 100. The 
attachment events of i are described by the process introduced 
in Section|n]with M=E and t = (A/2 — l)(a;). When Xi, and 
hence N{x), are very large, the event probability is dominated 
by the contribution from Xi and is therefore weakly dependent 
on the memory of i. In this case the inter-event times are 
distributed exponentially (as expected in a Bernoulli process). 
As {x) —>■ 0 the contribution from the memory of previous 
events becomes dominant and the distribution approaches a 
power-law with exponent —2. 


to the age of the oldest edge. Let us now consider the 
relationship between this attachment kernel and the pro¬ 
cess described in Section m If we consider the attach¬ 
ment of the end of an edge to the node i to be an ‘event’, 
then the probability of an event occuring at time t is 
given by Eq.® multiplied by 2 (corresponding to the 
two ends of the edge). Additionally, the event will be 
deleted from memory after precisely E iterations, so the 
number of edges corresponds to the length of the mem¬ 
ory i.e. M = E. Therefore, when e is chosen such that 
Xi+e = N{x)/2, Eq.([T]) and 2 times Eq.® become identi¬ 
cal and we conclude that the results of Section |ll] apply to 
the sequence of attachment events for individual nodes in 
this model. It is possible then, by choosing a fitness dis¬ 
tribution which ensures that (x) << (fc), to create burst 
like patterns of behaviour in the activity of the node i. 
As we deviate away from these conditions the activity 
of the nodes is better described by a Bernoulli process, 
giving exponentially distributed inter-event times, seen 
in Fig.Q. 

Results for the degree distribution are plotted in 
Figs. (03) and IMI. We find that in the case where the 
fitness follows a gamma distribution, pk approaches a 
power-law with exponent —1 as the mean fitness (x) 0 . 

In this limit, the heterogeneity in the degree distribution 
can be explained entirely by the fluctuations seen in indi- 
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FIG. 4. (Colour online) The degree distributions for both types of attachment kernel and two different forms of p{x). In each 
plot the markers show the results of a single numerical simulation of a network of 2 x 10 ^ nodes, the smooth lines show the 
corresponding analytical results. In Mall E = A x 10^ and the fitness distribution is p{x) = which is special case of 

the gamma distribution fEg. dCll) with a = 1 and j3 = 1/A]. For a range of values of A, an exponential fitness distribution is 
plotted with filled markers and the degree distribution is plotted with unfilled markers of the same shape. We see that in this 
particular case the parameter A does not effect the result. This is not the case when memory effects are introduced, shown 
in (03; as A increases the degree distribution approaches a power-law [see Ea dCSIl ]. In (H3 the power-law fitness distribution 
[given by Eg. dCQII with Xmin = 1 and 7 = 2.5] is plotted next to the degree distributions for a range of values of E (giving 
different densities). For large values of k the degree distribution has the same power-law exponent as the fitness distribution, 
even when memory effects are introduced as we see in (|4dl). The effect of including memory is however seen in the small values 
of k. 


vidual nodes, given by Ea. (|AIIIl : the fact that one node 
might have a greater fitness than another node becomes 
irrelevant. This however is not the case when the fit¬ 
ness distribution follows a power-law. If p{x) follows a 
power-law with exponent 7 then pk will have a power-law 
tail with the same exponent 7 . The effect of introducing 
memory is seen mostly in the smallest values of k which, 
in contrast to the memoryless case, remain relatively fre¬ 
quent, even in dense networks. We conclude then, that 
while both gamma and power-law distributions have tails 
which extend to infinity, as (x) 0 the effect of memory 

dominates over the fitness distribution in the first case, 
and the fitness distribution dominates over the effect of 


memory in the second. 

Given the degree distribution from a system whose be¬ 
havior meets the description of the model, our analysis 
suggests a method to infer the hidden fitness distribution 
numerically by assuming it takes the form of a step func¬ 
tion, reducing the problem to an optimisation problem 
given in Appendix [C] 


IV. DISCUSSION 

This analysis has potential to be useful in many ap¬ 
plications. Suppose we have a system where data ar- 
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rives in the form of a list of interactions between a fi¬ 
nite number of agents. This model provides a framework 
for interpreting such data. A sample of say, n inter¬ 
actions, can be thought of as a network with n edges, 
all of which are placed according to some hidden fitness 
variable which the present model makes no assumptions 
about. It should also be noted that the assumption in our 
model that interactions are pairwise can easily be gener¬ 
alised so that any given number of nodes may be active 
at each time-step (i.e. a hypergraph). We have shown 
the impact of the edge density (which can be interpreted 
as the size of the sample) on the degree distribution and 
that the effect of bursty, memory driven, behaviour is 
seen mostly in the nodes which have low intrinsic fitness. 
We also suggested a method to recover the hidden fitness 
distribution from the data. We note that the variability 
with edge density is very similar to the problem of time 
varying networks discussed in [s^ although in this work 
the authors focus on the issue of not counting multiple 
edges more than once (something the present analysis ig¬ 
nores) and are not concerned with the aggregate network 
after a long time when it reaches a high density. The re¬ 
sults of this paper have shown that the effect at high 
densities is profound and can be significantly altered by 
the addition of memory. 

The motivation for this work was the potential applica¬ 
bility to two specific areas of data analysis: online social 
interactions (e.g. Twitter) and the data received from 
fMRI scans, in particular when the cortical regions of the 
brain are considered as nodes and activity may transmit 
from one region to another (see for example HlSi). 
Both systems are known to exhibit bursty activity. In 
the case of human communication this is brought about 
by the reciprocation of messages. Empirical studies [l^ 
have found the power-law exponent in the inter-event 
time distribution to be between —1 and —2. This be¬ 
haviour can be recreated by our model but it requires 
negative values of x and for /(O) in Eq.([T|) to be defined 
separately. Further work would therefore be required to 
make this analysis directly applicable. Less is known 
about why burstiness occurs in the human brain but it 
is likely because of some kind of feedback mechanism 
[1^. Recovering a fitness distribution using the dynamic 
model with memory in either of these situations would 
effectively amount to filtering out the effects of these in¬ 
ternal feedback mechanisms and exposing the external 
influences on the system. 

Our final remark is a mention of the burst pattern re¬ 
sult observed in the activity of nodes (Fig.(l3l)). In many 
studies of burstiness (such as [s^ and [sj) the power-law 
inter-event time distribution is included as an a priori as¬ 
sumption in the description of the model. We have shown 
that this pattern can emerge from a simpler, lower-level 
process, suggesting that there could be a universal reason 
why such patterns are observed so frequently in complex 
systems. The relationship between this result and the 
well studied SOC models needs to be established in or¬ 
der to move towards an analytical understanding of both 


phenomena, hopefully broadening this model to a wide 
range of universality classes, and potentially extending 
its applicability. 
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Appendix A: Solution for the inter-event time 
distribution 


We first find pk, the probability that kt = k for a 
randomly selected t € N. For the general event proba¬ 
bility kernel / we find a recursion relation relating pk to 
Pk-i- We then continue by examining only the special 
case where 


f{k) 


k + X 
M + X + € 


(Al) 


for constants x and e and find the exact solution for pk- 
From this result we approximate the probability that the 
time between two events is exactly r iterations of the 
model. 


1. Memory size distribution 



1 Added | 

Probability 

1 0 

f{kt) 1 - fikt) 

1 

ktIM 

fct+i = kt kt-\-i = kt 1 

0 

1 - kt/M 

kt+i = kt + 1 kt+i = kt 


TABLE 1. At each iteration A* € 0,1 is added to the mem¬ 
ory while at the same time a randomly selected entry will be 
removed. We show the probabilities of these events and how 
each possible combination changes kt. 


Table U shows the possible events which can happen 
regarding the addition and deletion of Is in the memory. 
All possible transitions of k^ are brought together in the 
following master equation which describes the evolution 







of Pk{t): 


2 . Inter-event time distribution 


Pk{t) = 


1 - 


k-l 

IT 


f{k)pk-i{t - 1 ) 




Pk{t - 1) (A2) 


+ [1 “ /(^ + 1)] ( ) Pk+i{t - 1) 


As t —>■ oo the distribution will converge towards a time- 
invariant distribution, pk, described by 


1 - 


k-l 

M 


— f{k) — 2 —f{k) 
M ^ ^ M ^ ^ 


f{k)pk-i - 

+ [l-fik + l)] 


Pk 


k + 1 
M 


Pk+i = 0. 
(A3) 


This second-order recurrence relation reduces to a first- 
order recurrence relation with the introduction of 

H{k) = ^Al-f{k)]pk (A4) 


and 


F{k)=f{k) 



(AS) 


using the condition that p-i = 0 in Ea. (IA3l) we see that 
F{ 0 ) = FI{ 1 ) and also that Ea. (IA3l) becomes 


F{k)-F{k-l) = H{k + l)-H{k). (A6) 


Clearly then F(k — 1) = F[{k), and so pk obeys the first- 
order recurrence equation: 


Pk 


fM-l-k\ 

\fik-l)] 

V k ) 

U-f{k)\ 


Pk-i- 


(A7) 


Writing pi in terms of po, then p 2 in terms of pi and so 
on, we can express Eq. (lATl) as 


Pk 


Poll 


(M -l-i\ 

I 

■ 

1 * ) 

.!-/(*). 


(AS) 


Here we derive H,-, the probability that a randomly 
selected interval has size r. Suppose we select a random 
Xt from the sequence. For X-t to be 0 and belong to 
an interval of length r it must be preceded by a string 
composed of a 1 followed by r' Os, and it must be the first 
0 in a sequence of r — r' Os followed by another 1. The 
variable t' can be any integer from 1 to r and we need 
to sum the probabilities of each possibility to arrive at 
the probability that At is a 0 at any location within an 
interval of size r. Expressed symbolically, the previous 
sentence is equivalent to 


Tllrit) = fikt-Af{kt+r-r' + l) n 1 

t' — 1 i—t—r' + l 

(AlO) 

where H,- (t) is the probability that the interval containing 
At has length r. The multiplication by r on the left hand 
side comes from the fact that there are r choices of At 
which belong to this interval. We make the following 
approximations and coarsening of the model: 


1. We assume that M is large and also consider only 
the values of k large enough for Stirling’s approx¬ 
imation to be a valid to approximate the Gamma 
functions in Ea. (IA9l) . We further limit our atten¬ 
tion to those values of k for which M » k and 
get 


Pk 


n i-te 


1 - 


M 


Po 

Tix)‘ 


ix-l 


Po 

Tix)‘ 


I x-l 


(All) 


2. We choose M >> S,e which means f(kt) ~ kt/M. 
More importantly, if we say that P{f{kt) = </)) is 
the probability that f{kt) = f ior a. randomly se¬ 
lected t G N then from Ea. dAlip we have 

P{f{h) = 4>)^^AM4>Y-\ (A12) 

r(a;) 

3. Over short time periods, changes to kt will be small. 
In other words, locally the system behaves as a 
Bernoulli process with success probability given by 
(/). This allows Ea. dAlOp to be approximated by 

Tlr{t)= f{hf[l-f{kt)Y- (A13) 

When t G N is selected randomly this is equivalent 
to 


We choose at this point to investigate only the linear 
case with /(fc) given by Ea. dAlD . In this instance the 
translation property of the Gamma function (a;r(x) = 
r(a: -I- I)) can be used and we arrive at 

r(M-I)r(M-fc + e)r(fc + 3:) 

T{M+ e)T{M-k-l)T{x)k\ ^ ^ 

giving the probability distribution for the number of Is 
in the memory at any given time. 


P{T\f{kt) = ct>) = A{l-4^Y. (A14) 


4. We approximate ^ by a continuous variable. 


The time-independent solution to the inter-event time 
distribution is found by solving 


n. = f P{f{kt) = ci>)P{T\f{kt) = f)df 

' fr+Yi-fVdf. 

Jo 


(A15) 


r(a:) 
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Thus we find that the inter-event time distribution is 
given by a Beta function IIt- ^ B{t -|- 1, x -I- 2)which for 
large values of r obeys 

(A16) 


Appendix B: Solution for the network degree 
distribution 


1. Dynamic model without memory 


For each positive integer k we want to know the num¬ 
ber of nodes Uk that have degree fc as a function of the 
fitness distribution /o(x), as well as the parameters N and 
E. This quantity is the expectation of the degree distri¬ 
bution; the mean of the ensemble of networks generated 
in this way. Letting t be the number of iterations and 
nfc(x, t) be the expectation of the number of nodes of de¬ 
gree k with fitness x at time t, we can write down the 
rate of change 


dnk{x,t) 

di 


j^[nk-i{x,t) -nk{x,t)] 

+ 4[(fc + l)nfc+i(x,t) - knk{x,t)]. 


(Bl) 


The first two terms on the right hand side account for 
the creation and destruction (respectively) of nodes of 
degree k which occurs when an edge is attached to a 
node of degree k — 1 (creation) or to a node of degree 
k (destruction). The last two terms on the right hand 
side account for the creation and destruction of nodes of 
degree k which occurs when the oldest edge is removed 
from a node of degree A: -|- 1 (creation) or removed from 
a node of degree k (destruction). We have assumed here 
that the ages of edges adjacent to a node are not cor¬ 
related, thus the process of removing the oldest edge is 
approximately the same as removing a randomly selected 
one. After a large number of iterations the system will be 
in equilibrium, nk{x,t) = nk{x), and the left hand side 
will be equal to zero. Using a similar method to that 
found in we solve Ea. dBlI) by introducing 

2x 1 

H{k,x) = , nk{x) and G{k,x) = —knk{x). (B2) 
N{x) E 

Ea. dBlI) now becomes 


G{k + l)-G{k) = H{k)-H{k-l). (B3) 


By summing Eas. dB3D over all fc > 1 we find that 
G(0, x) = H{1, x) and consequently G(k, x) = iL(fc—1, x) 
for all fc > 1, solving this leads to 


nk{x) 



k ^ 

^no(x). 


(B4) 


To find no we consider fV(x), the expected number of 
nodes of fitness x, 


where I = 2E/N{x). The conditional probability 
P{k\x) = nk{x)/N{x) is found by combining Ea. dBdl) 
and Ea. dBSI) to get 

P(k\x) = ^(lx)’^e-^^. (B6) 

k\ 

Thus isolating only the nodes which have fitness exactly 
equal to x we find a Poission degree distribution. Inter¬ 
estingly, this implies that if one was to take a sample of 
nodes which all have a similar fitness value, one would 
see a network which looks very similar to an Erdds-Renyi 
random graph. [Ea. dBfil) can also be found by more di¬ 
rect means. It can be expressed as the probability of k 
successes in 2E trials where the probability of success, 
i.e. creating an edge, is given by Eq.(|4]). P{k\x) is given 
by a binomial variable and gives the same result when 
N oo.] 

To finally reveal the fraction of nodes in the entire 
network of degree fe, pk = nk/N, we need to solve the 
integral 

poo jk pco 

Pk= p(x)P{k\x)dx = T, I p(x)x^e~^^dx. (B7) 
Jo Jo 

This is as far as a the general solution can be taken but 
the solutions for two special forms of p{x) are presented 
in Section [C] 


2. Dynamic model with memory 

The rule determining whether a node is active at any 
given time can be divided into two constituent mech¬ 
anisms: one is regarded as a reaction to one or more 
previous interactions; it is memory dependent and is re¬ 
sponsible for bursts of activity. The other is the fitness 
of the node which encompasses all the other reasons why 
a node may become active at any given time. Modifying 
the model lEg. dBlI) ') for the new kernel Eq.(l5]) we now 
have 

dnk{x, t) 

Wt ^ 

2 

jV((fc) -p {x)) + ^ ~ - (k + x)nk{x,t)\ 

+ 4[(^ + l)nfe-n(a;,t) - knk{x,t)]. 

(B8) 

As before, we set the left hand side to zero to get a dif¬ 
ference equation 

knk{x) -{k + l)nfc+i(x) = 

m[(fc - l)nk-i{x) - kukix) + xnk-i{x) - xnfc(x)], 

(B9) 

where m = {k)/{{k) + {x)). To solve this we introduce 
the generating function. 


tXJ 

^{x) = ^(/x)'=—no(x) = no{x)e^^ 


9{z,x) = ^nk{x)z^ 


(B5) 


(BIO) 
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by multiplying Eg. dBQI) by and summing over all /c > 0 
we arrive at 


Appendix C: Examples of specific fitness 
distributions 


(z-l)(l 


mz) 


dg{z,x) 

dz 


mx{z — l)g(z, x) 


0 (Bll) 


1. Gamma distribution 


which has the solution g{z, x) = [C(l — (a general 

de scrip tion of this method is described in the appendix 
of [28^1 . We find C by substituting = N{x) into 

the solution and get 


We examine in detail the possible scenario where the 
fitness of the population follows the gamma distribution 


p{x-,a,P) 


^a-l^-x/P 

/3“r(a) 


(Cl) 


g{z,x) = N{x) fy —. (B12) 

\ 1 — mz ) 

The coefficient of z^ in the expansion of the right hand 
side is nk{x), dividing this by N{x) then gives the follow¬ 
ing conditional probability which contrasts with Eq. (IB6I) 


which generalises a number of distributions that have ap¬ 
plications in social sciences including £md the expo¬ 
nential distribution. In general it has the appearance 
of an asymmetric bell curve and we consider it entirely 
likely that a system might exist where the fitness values 
are clustered around the mean in this way. 


P{k\x) 


X + k — 1 
k 


(1 — mYm^. 


(B13) 


a. Dynamic model without memory 


As {x) —>■ oo, P[k\x) tends towards the Poisson distri¬ 
bution with the same mean we had in Ea. (IB6l) . This is 
expected since in this limit the attachment kernel for any 
given node will be dominated by its fitness. Let pk be 
the fraction of nodes with degree k and is the integral of 
the product of p{x) and the right hand side of Eg. (IB 131) 
over all possible values of x 

Pk = -rr / x{x+l)...{x+k-l){l-mYp{x)dx. (B14) 

Jo 

We can simplify the integral by multiplying out all the 
brackets which contain x, this gives 


We solve Eo. dBTI) to find the degree distribution. The 
integral becomes 

jk roc 

"“‘MimoL (C2) 


By applying the change of variables y = {I + l//3)x the 
integral becomes the product of a gamma function and 
some other factors. We arrive at 


Pk 


Yr{k -I- a) 

/3“(/ -f 1/l3Y+°^T{a)kV 


(C3) 


For large values of k this solution becomes a gamma dis¬ 
tribution Pk ^ p(fc; a, log(l -I- 


k ^ p oo 

Pk = ^'^c{k,n) / a;"(l - m)“p(a;)dx. (B15) 

n=0 •^0 


b. Dynamic model with memory 


Here c(fc, n) denotes the unsigned Stirling numbers of the 
first kind (the number of permutations of k symbols that 
have exactly n cycles 0), since an explicit expression for 
these is not known, Eq. (IB15I) is only useful at small values 
of k. For large k we examine the generating function 

OO 

G(z)=^PfeZ^ (B16) 

k=0 


We substitute Eo. dCll) into Eg. dBlSI) and applying the 
change of variables; y = x[{l/(i) — log(l — m)], we can 
again take a gamma function out as a factor, leaving 


Pk 


^ c(fc, n)T{n+a) 
k\P°‘T{a) 



1 

1 


— {n+a) 


(C4) 

Substituting Ea. dCll) into Eo. dBlTI) and solving the inte¬ 
gral we arrive at 


It follows from Eo. dB12p that 

G{z) = / p{x) —^ dx. (B17) 

Jo \l-mzj 

When the fitness parameter is the same for all nodes, 
Xi = a, the model reduces to that studied in [2^. Substi¬ 
tuting p{x) = 5{x —a) into Ea. (jB17l) yields the expected 
result. 


G{z) 


1 -I- log 


/I — mz\ 
\ 1 — m / 


— a 


(C5) 


As z —> 1, the logarithm in the above expression ap¬ 
proaches 0 making the approximation log(A) « A — I 
appropriate to use. For z « I we have 


G(z) 


1 — m 
I — mz 


OL^ 


(C6) 
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which can be expanded to recover the power series. 
Equating the coefficients of the expansion with those of 
Ea. (IB16l) we find 


fy-mf (C7) 

For large k this is 


log(l/m)y 

where c = [(1 — m)/log(l/m)]“^ is a normalising con¬ 
stant. As the mean (x) = a/3 tends towards 0 the dis¬ 
tribution tends towards a power-law with exponent — 1 . 
This represents a scenario where the majority of actions 
are in fact reactions to previous events. 


Pk 


(1 - m)“^ 

r(a/3) ' 




2. Power-law distribution 


Suppose fitness is distributed according to the follow¬ 
ing power-law 


p(x, Xjnin; T) 



which has the mean 


{x) 


7-1 

^min ■ 

7-2 


if X> Xniin 
if X < Xniin 

(C9) 

(CIO) 


a. Dynamic model without memory 


To find the degree distribution we substitute Ea. (IC9D 
into Ea. (IB7() . giving 


Pk = 


x]-2k\ 


^k-^^-lxdx. 


(Cll) 


Using the substitution y = lx, the integral can be ex¬ 
pressed using the upper incomplete gamma function (see 
(40l| l defined as r(M, v) = v"~^e~'^dv for real numbers 

u and V. We can also simplify the solution by combining 
the parameters using 


A = lx 


min 


273(7 - 2 ) 

/V(7-l) 


and we get 


Pk 


(7 - 

k\ 


T{k — 7 -I- 1, A). 


(C12) 


(C13) 


Notice that all choices of imin yield the same result. This 
is not unexpected; the scale-invariance of the power law 
distribution means that generating a random fitness xt 


using Ea. (IC9l) is eqivalent to generating ^ from p(^, 1, 7 ) 
and taking Xi = ^ccmin as the fitness value. Substituting 
this fitness value into Eq. ([3]) we see that Xmin is no longer 
present. 

It is also informative to solve Eq. (ICllI) for integer val¬ 
ues of 7 . We first express the part inside the integral as 
a derivative 




(-l)fc-7 


dk-y^-xy 


y^l 


(CM) 


before performing the integration with respect to x. 
Since 


and 


e-^^dx 


exp{-Xminy) 

y 


(C15) 


dy"^ 


exp{-Xniiny) 


= (-l)”exp(-Xn 


Til 

ay) ^ ^ “^min 
s =0 


y 


(C16) 


for n G N, for integer values of 7 we arrive at 


(7 l)(^^min)^ axp( IXuun) (^^min) 


Pk = 


k{k — 1 )... (fc — 7 -I- 1 ) 


E 

s =0 


5 ! 


which, using Eq. (|C12p simplifies to 

(7 - l)A^-^e-^ ^ A® 

^ E w 


fc(fc-l)...(/c -7 + l) ^ s! 


(C17) 

(CIS) 


It is now easy to see that the degree distribution has a 
power-law tail (see Fig. (Belli. 


b. Dynamic model with memory 

First we substitute Ea. (IC9l) into Eg. dBlSI) . We intro¬ 
duce L = — log(l — m), then, by applying a the change 
of variables y = Lx, we can factorise out an incomplete 
gamma function. This gives the following exact solution 
for the degree distribution 

Pk = E n)L^“”“^r(n - 7 -f 1, LXnCm)- 

f^Kain n=0 

(C19) 

The parameter Xmin, which was absent in Eq. (IC13I) , now 
controls the overall effect of fitness in proportion to mem¬ 
ory. For large values of k we solve Ea. (IB17p to find 

G{z) = (7- l)r[l -7, $(a;min,z)][$(a;min,^)]^“^ (C20) 

where 

■ (C21) 

1 — m J 






.log 
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Using the approximation log(X) « X — 1 as z approaches 
1 we find 


G(z) « (7 - l)'>'"^r(l - - z)^-^ (C22) 

where A is given by Ea. (IC12p . For non-integer values of 7 
this can be expanded and the coefficients of the expansion 
can be equated with Ea. (IB16l) . We see that 


Pk 


(7 — 1)7!^ ^r(fc —7-1-1) 
r(A: + l) ■ 


(C23) 


The k dependence exists in the form of the ratio of two 
gamma functions so asymptotically pfc ^ . It is worth 

remarking that the power-law exponent in the fitness dis¬ 
tribution is the same exponent found in the degree dis¬ 
tribution and is not affected by the choice of the other 
parameters N, E or ccmin as can be seen in Fig. (l4d)l . 


3. Step function distribution 

For practical purposes it is useful to have a general 
method of inferring a fitness distribution from a degree 
distribution. We suggest one such approach here and 
focus exclusively on the case where memory effects are 
present. 

By assuming the fitness distribution has the form of a 
step function (otherwise know as a staircase function) we 
can minimise the error between the theoretical prediction 
and the observed data by adjusting the height of each 
step (or stair). Suppose we have a vector of parameters 
a = [ao,ai, ...,aj], we then define the distribution as 


distribution p = [po,Pi, ■■■,Pk] where K is the largest de¬ 
gree. The degree distribution p, the generating function 
as given by Ea. dBlfip . fitness parameters a, and the gen¬ 
erating function as given by Ea. (IC26D are all connected 
by the following expression: 

Zp = Wa. (C27) 

Here Z is a I x K matrix whose (z, fe)th entry is = 
z^Gi and W is a IX J matrix whose (z, j)th entry is given 
by 


[(1 — to)/( 1 — TOZi_l)]^ — 1 / 1 — TO 

log[(l - to)/( 1 - TOZi_i)] \l-mz^-i) 

(C28) 

While Ea. (IC27l) appears to be a simple linear algebra 
problem, it is complicated by the fact that to depends 
on (x), which is only known after a choice of a has been 
made, therefore IF is a function of both a and S. This 
does however provide a neat way to formally present the 
problem: We choose J < K to prevent having more pa¬ 
rameters than datapoints and solve 

p{x) = p{x,a,S) (C29) 

such that 

||Zp — IFa|| = min IjZp — IFa|| . (C30) 


' oo for 0 < X < ^ 
oi for 5 < X < 2(5 


p(x,a,(5) 


aj for jS < X < (j -|- 1)(5 


(C24) 


. aj for JS < X < {J + 1)(5 

where X]/=o The mean fitness is 

(x) = av (C25) 

where v = (J/2)[l, 3,..., 2 J -|- 1]. Substituting p into 
Ea. (IB17l) we get 



[(1 — to )/(1 — TOZ )]^ — 1 ^ / 1 — to \^* 

log[(l — m)/(l — mz)] ^ ° \ 1 — mz J 

(C26) 


We can generate (randomly or systematically) a vector 
of values z = [zq, Zi,zj] at which the generating func¬ 
tion can be evaluated. The empirical data is the degree 
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